
    Benchmarking some Portuguese S&T system research units: 2nd Edition

    The increasing use of productivity and impact metrics for evaluation and comparison, not only of individual researchers but also of institutions, universities and even countries, has prompted the development of bibliometrics. Currently, metrics are becoming widely accepted as an easy and balanced way to assist the peer review and evaluation of scientists and/or research units, provided they have adequate precision and recall. This paper presents a benchmarking study of a selected list of representative Portuguese research units, based on a fairly complete set of parameters: bibliometric parameters, number of competitive projects and number of PhDs produced. The study aimed to collect productivity and impact data from the selected research units under comparable conditions, i.e., using objective metrics based on public information that is retrievable online and/or from official sources, and is thus verifiable and repeatable. The study therefore focused on activity in the 2003-06 period, for which such data were available from the latest official evaluation. The main advantage of our study was the application of automatic tools, which achieved relevant results at a reduced cost. Moreover, the results for the selected units suggest that this kind of analysis will be very useful for benchmarking scientific productivity and impact and for assisting peer review. Comment: 26 pages, 20 figures. F. Couto, D. Faria, B. Tavares, P. Gonçalves, and P. Verissimo, Benchmarking some Portuguese S&T system research units: 2nd edition, DI/FCUL TR 13-03, Department of Informatics, University of Lisbon, February 201
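The abstract makes metrics conditional on adequate precision and recall. As an illustration only, the sketch below computes both measures for a hypothetical automatic tool that retrieves a research unit's publications, compared against a manually verified list. The function name `precision_recall` and the publication identifiers are invented for this example and do not come from the report.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision and recall of a retrieved set against a verified (relevant) set."""
    true_positives = len(retrieved & relevant)
    # Precision: fraction of retrieved records that are correct.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    # Recall: fraction of verified records that were retrieved.
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical publication identifiers for one research unit.
retrieved = {"pub1", "pub2", "pub3", "pub4"}
verified = {"pub1", "pub2", "pub3", "pub5", "pub6"}
p, r = precision_recall(retrieved, verified)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.60
```

A tool with high precision but low recall would undercount a unit's output, which is why both measures matter when comparing units.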

    Disjunctive shared information between ontology concepts: application to Gene Ontology

    Background: The large-scale effort in developing, maintaining and making biomedical ontologies available motivates the application of similarity measures to compare ontology concepts or, by extension, the entities described therein. A common approach, known as semantic similarity, compares ontology concepts through the information content they share in the ontology. However, semantic similarity measures frequently neglect, or do not properly explore, the different disjunctive ancestors of concepts in the ontology. Results: This paper proposes a novel method, dubbed DiShIn, that effectively exploits the multiple inheritance relationships present in many biomedical ontologies. DiShIn calculates the shared information content of two ontology concepts based on the information content of the disjunctive common ancestors of the concepts being compared. DiShIn identifies these disjunctive ancestors through the number of distinct paths from the concepts to their common ancestors. Conclusions: DiShIn was applied to the Gene Ontology, and its performance was evaluated against state-of-the-art measures using CESSM, a publicly available platform for evaluating protein similarity measures. By modifying the way traditional semantic similarity measures calculate the shared information content, DiShIn obtained a statistically significantly higher correlation between semantic and sequence similarity. Moreover, incorporating DiShIn into existing applications that exploit multiple inheritance would reduce their execution time.
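The selection of disjunctive common ancestors can be sketched as follows, under our reading of the method: for each common ancestor a, compute the path difference |paths(c1, a) - paths(c2, a)|; for each distinct difference value, keep the most informative ancestor; the shared information content is the average IC of the ancestors kept. The toy ontology, its IC values, and all function names are hypothetical. Path counting is naive (no memoization), and ties in IC keep a single representative per difference value, which is a simplification.

```python
from typing import Dict, List, Set

# Hypothetical toy ontology: each concept maps to its direct is-a parents.
Parents = Dict[str, List[str]]


def ancestors(parents: Parents, concept: str) -> Set[str]:
    """All ancestors of a concept, including the concept itself."""
    seen: Set[str] = set()
    stack = [concept]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, []))
    return seen


def count_paths(parents: Parents, start: str, target: str) -> int:
    """Number of distinct upward paths from start to target (1 if they are equal)."""
    if start == target:
        return 1
    return sum(count_paths(parents, p, target) for p in parents.get(start, []))


def dishin(parents: Parents, ic: Dict[str, float], c1: str, c2: str) -> float:
    """Shared IC: average IC of the disjunctive common ancestors of c1 and c2."""
    common = ancestors(parents, c1) & ancestors(parents, c2)
    best_by_diff: Dict[int, float] = {}
    for a in common:
        diff = abs(count_paths(parents, c1, a) - count_paths(parents, c2, a))
        # Keep only the most informative ancestor for each path difference.
        if diff not in best_by_diff or ic[a] > best_by_diff[diff]:
            best_by_diff[diff] = ic[a]
    return sum(best_by_diff.values()) / len(best_by_diff) if best_by_diff else 0.0


# Toy DAG with multiple inheritance: C is-a A and C is-a B.
parents = {"R": [], "A": ["R"], "B": ["R"], "C": ["A", "B"], "D": ["A"], "E": ["C"]}
ic = {"R": 0.0, "A": 1.0, "B": 1.0, "C": 2.0, "D": 2.0, "E": 3.0}
print(dishin(parents, ic, "E", "D"))  # 0.5
print(dishin(parents, ic, "E", "C"))  # 2.0
```

For E and D, the root R is kept alongside A because E reaches R through two paths (via A and via B) while D reaches it through one; a measure that only takes the most informative common ancestor would ignore that second route.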

    Semantic Similarity in Cheminformatics

    Similarity in chemistry has been applied to a variety of problems: to predict biochemical properties of molecules, to disambiguate chemical compound references in natural language, to understand the evolution of metabolic pathways, to predict drug-drug interactions, to predict therapeutic substitution of antibiotics, and to estimate whether a compound is harmful, among others. While similarity measures have been created that make use of the structural properties of molecules, some ontologies, with Chemical Entities of Biological Interest (ChEBI) being one of the most relevant, capture chemistry knowledge in machine-readable formats and can be used to improve our notions of molecular similarity. Ontologies in the biomedical domain have been used extensively to compare entities of biological interest, a technique known as ontology-based semantic similarity. This technique has been applied to various biologically relevant entities, such as genes, proteins, diseases, and anatomical structures, as well as in the chemical domain. This chapter introduces the fundamental concepts of ontology-based semantic similarity, its application in cheminformatics, its relevance in previous studies, and its future potential. It also discusses the existing challenges in this area, tracing a parallel with other domains, particularly genomics, where this technique has been used more often and for longer.
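Ontology-based semantic similarity commonly rests on information content (IC). One classic formulation, not necessarily the one the chapter emphasizes, is Resnik's measure: IC(c) = log(total / freq(c)), where freq(c) counts annotations to c or any of its descendants, and the similarity of two concepts is the IC of their most informative common ancestor. The hierarchy below is a simplified, hypothetical stand-in (not ChEBI's actual structure), and the annotation counts are invented.

```python
import math
from typing import Dict, List, Set

Parents = Dict[str, List[str]]


def ancestors(parents: Parents, concept: str) -> Set[str]:
    """All ancestors of a concept, including the concept itself."""
    seen: Set[str] = set()
    stack = [concept]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, []))
    return seen


def information_content(parents: Parents, annotations: Dict[str, int]) -> Dict[str, float]:
    """IC(c) = log(total / freq(c)); freq(c) includes annotations to descendants."""
    freq = {c: 0 for c in parents}
    for concept, count in annotations.items():
        # Each annotation also counts once for every ancestor of the annotated concept.
        for a in ancestors(parents, concept):
            freq[a] += count
    total = sum(annotations.values())  # equals the root's frequency in a single-rooted hierarchy
    return {c: math.log(total / f) for c, f in freq.items() if f > 0}


def resnik(parents: Parents, ic: Dict[str, float], c1: str, c2: str) -> float:
    """IC of the most informative common ancestor of c1 and c2."""
    common = ancestors(parents, c1) & ancestors(parents, c2)
    return max((ic[a] for a in common if a in ic), default=0.0)


parents = {
    "root": [],
    "acid": ["root"],
    "carboxylic_acid": ["acid"],
    "amino_acid": ["carboxylic_acid"],
    "fatty_acid": ["carboxylic_acid"],
    "alcohol": ["root"],
}
annotations = {"amino_acid": 4, "fatty_acid": 2, "alcohol": 2}  # invented counts
ic = information_content(parents, annotations)
print(round(resnik(parents, ic, "amino_acid", "fatty_acid"), 4))  # 0.2877
print(resnik(parents, ic, "amino_acid", "alcohol"))  # 0.0
```

Rarely annotated classes get high IC, so two compounds sharing a specific class score higher than two compounds that only meet at the root.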

    Complex associations between genetic variants and clinical profiles in autism spectrum disorder patients: an integrative systems biology approach

    The genetic and clinical complexity that characterizes Autism Spectrum Disorder (ASD) has hindered the development of biomarkers for early diagnosis and reliable prognosis, as well as of a personalized approach to therapeutic intervention. This study aimed to develop an integrative approach for predicting clinical presentation from Copy Number Variants (CNVs), with clinical application to the diagnosis and prognosis of ASD. For this purpose, machine learning techniques were applied to clinical and genetic data from 2446 patients with ASD, recruited by the Autism Genome Project consortium. Clustering analysis of multidimensional clinical data defined two patient subgroups in this population, with clinical profiles differing significantly in verbal ability, cognitive level, disease severity and adaptive behavior. In the same subjects, analysis of CNVs specifically affecting brain-expressed genes identified 15 biological processes enriched for the disrupted genes. A machine learning algorithm was trained and tested to classify patients with a more dysfunctional clinical presentation based on the altered biological processes. The results showed that correlations between clinical phenotype and underlying biology can be established in ASD and that, for datasets with sufficiently informative data, there is reasonable predictive power. Further studies with more complete clinical and genomic data are needed to implement this concept in clinical practice.
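The abstract reports 15 biological processes enriched for CNV-disrupted genes but does not name the statistical test. A common choice for this kind of over-representation analysis, assumed here rather than taken from the study, is the one-sided hypergeometric test. The sketch uses only the standard library; the gene counts in the usage line are invented, and multiple-testing correction, which is needed when many processes are tested, is omitted.

```python
from math import comb


def hypergeom_enrichment_p(N: int, K: int, n: int, k: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background (e.g. brain-expressed genes)
    K: background genes annotated to the biological process
    n: genes disrupted by CNVs in the patient group
    k: disrupted genes annotated to the process
    """
    upper = min(K, n)
    # Exact integer arithmetic; comb() returns 0 for impossible draws.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical counts: 2000 background genes, 50 in the process,
# 100 disrupted genes, 10 of which fall in the process.
print(f"{hypergeom_enrichment_p(2000, 50, 100, 10):.2e}")
```

With these invented numbers about 2.5 process genes would be expected by chance, so observing 10 yields a small p-value.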

    A Silver Standard Corpus of Human Phenotype-Gene Relations

    Human phenotype-gene relations are fundamental to fully understanding the origin of some phenotypic abnormalities and their associated diseases. Biomedical literature is the most comprehensive source of these relations; however, Relation Extraction tools are needed to recognize them automatically. Most of these tools require an annotated corpus, and, to the best of our knowledge, no corpus annotated with human phenotype-gene relations is available. This paper presents the Phenotype-Gene Relations (PGR) corpus, a silver standard corpus of human phenotype and gene annotations and their relations. The corpus consists of 1712 abstracts, 5676 human phenotype annotations, 13835 gene annotations, and 4283 relations. We generated this corpus using Named-Entity Recognition tools, whose results were partially evaluated by eight curators, obtaining a precision of 87.01%. Using the corpus, we obtained promising results with two state-of-the-art deep learning tools, reaching a precision of 78.05%. The PGR corpus was made publicly available to the research community. Comment: NAACL 201
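The abstract does not describe how candidate relations were derived from the Named-Entity Recognition output. One plausible scheme, shown purely as an assumption, pairs every phenotype and gene annotation that co-occur in the same sentence and labels a pair as a relation when it appears in a reference set of known phenotype-gene associations. The `Annotation` type, `candidate_relations`, and the placeholder entity names are all hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Annotation:
    sentence_id: int
    text: str
    kind: str  # "phenotype" or "gene"


def candidate_relations(
    annotations: List[Annotation], known: Set[Tuple[str, str]]
) -> List[Tuple[int, str, str, bool]]:
    """Pair phenotype and gene annotations within each sentence and label each pair."""
    by_sentence: Dict[int, List[Annotation]] = {}
    for ann in annotations:
        by_sentence.setdefault(ann.sentence_id, []).append(ann)
    pairs = []
    for sid in sorted(by_sentence):
        anns = by_sentence[sid]
        phenotypes = [a.text for a in anns if a.kind == "phenotype"]
        genes = [a.text for a in anns if a.kind == "gene"]
        for phenotype, gene in product(phenotypes, genes):
            # Label as a relation only if the pair is in the reference set.
            pairs.append((sid, phenotype, gene, (phenotype, gene) in known))
    return pairs


# Placeholder annotations standing in for NER output.
anns = [
    Annotation(0, "PHENOTYPE_A", "phenotype"),
    Annotation(0, "GENE_X", "gene"),
    Annotation(0, "GENE_Y", "gene"),
    Annotation(1, "PHENOTYPE_B", "phenotype"),
]
known = {("PHENOTYPE_A", "GENE_X")}
for row in candidate_relations(anns, known):
    print(row)
# (0, 'PHENOTYPE_A', 'GENE_X', True)
# (0, 'PHENOTYPE_A', 'GENE_Y', False)
```

Because such labels come from automatic annotation and a reference set rather than full manual curation, a corpus built this way is a silver rather than gold standard.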

    Exploiting disjointness axioms to improve semantic similarity measures

    Motivation: Representing domain knowledge in biology has traditionally been accomplished by creating simple hierarchies of classes with textual annotations. Recently, expressive ontology languages, such as the Web Ontology Language (OWL), have become more widely adopted, supporting axioms that express logical relationships other than class-subclass, e.g. disjointness. This is improving the coverage and validity of the knowledge contained in biological ontologies. However, current semantic tools still need to adapt to this more expressive information. In this article, we propose a method to integrate disjointness axioms, which are being incorporated into real-world ontologies such as the Gene Ontology and the Chemical Entities of Biological Interest (ChEBI) ontology, into semantic similarity, the measure that estimates the closeness in meaning between classes. Results: We present a modification of the measure of shared information content that extends the base measure to incorporate disjointness information. To evaluate our approach, we applied it to several randomly selected datasets extracted from the ChEBI ontology. In 93.8% of these datasets, our measure performed better than the base measure of shared information content. This supports the idea that semantic similarity is more accurate if it extends beyond the class hierarchy of the ontology. Supplementary information: Supplementary data are available at Bioinformatics online.
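The abstract does not give the modified formula, so the sketch below covers only a preliminary step any such method needs: deciding whether two classes are disjoint. Under OWL semantics, if A and B are declared disjoint, every subclass of A is disjoint with every subclass of B, so checking all ancestor pairs against the asserted axioms is a sound but incomplete test (a real ontology would call for an OWL reasoner). The hierarchy, axiom set, and function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set

Parents = Dict[str, List[str]]


def ancestors(parents: Parents, concept: str) -> Set[str]:
    """All ancestors of a concept, including the concept itself."""
    seen: Set[str] = set()
    stack = [concept]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, []))
    return seen


def are_disjoint(parents: Parents, axioms: Set[FrozenSet[str]], c1: str, c2: str) -> bool:
    """True if some ancestor of c1 is declared disjoint with some ancestor of c2."""
    anc2 = ancestors(parents, c2)
    for a1 in ancestors(parents, c1):
        for a2 in anc2:
            # Axioms are stored as unordered pairs, so lookup order does not matter.
            if frozenset((a1, a2)) in axioms:
                return True
    return False


parents = {
    "root": [],
    "class_A": ["root"],
    "class_B": ["root"],
    "class_A1": ["class_A"],
    "class_B1": ["class_B"],
    "class_C": ["root"],
}
axioms = {frozenset(("class_A", "class_B"))}
print(are_disjoint(parents, axioms, "class_A1", "class_B1"))  # True
print(are_disjoint(parents, axioms, "class_A1", "class_C"))  # False
```

A purely hierarchical measure would treat class_A1/class_B1 and class_A1/class_C alike, since both pairs meet only at the root; the inherited disjointness axiom is what separates them.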